An interpretable data-driven prediction model to anticipate scoliosis in spinal muscular atrophy in the era of (gene-) therapies

5q-spinal muscular atrophy (SMA) is a neuromuscular disorder (NMD) that has become one of the approximately 5% of rare diseases for which a treatment is available. The efficacy of new SMA therapies is creating a dynamic SMA patient landscape in which disease progression and scoliosis development play a central role but remain difficult to anticipate. New approaches to anticipating disease progression and associated sequelae will be needed to continue providing these patients with the best standard of care. Here we developed an interpretable machine learning (ML) model that can serve as an assistive tool for anticipating SMA-associated scoliosis based on disease progression markers. We collected longitudinal data from 86 genetically confirmed SMA patients and selected six routinely assessed features to train a random forest classifier. The model achieved a mean accuracy of 0.77 (SD 0.2) and an average ROC AUC of 0.85 (SD 0.17). For class 1 'scoliosis', the average precision was 0.84 (SD 0.11), the recall 0.89 (SD 0.22), and the F1-score 0.85 (SD 0.17). Our trained model predicted scoliosis from selected disease progression markers, and its predictions were consistent with radiological measurements. During post-validation, the model predicted scoliosis in patients who were unseen during training. We also demonstrate that rare disease data sets can be wrangled to build predictive ML models. Interpretable ML models can function as assistive tools in a changing disease landscape and have the potential to democratize expertise that is otherwise clustered at specialized centers.


Data export and preparation
The SMA data set was fully updated and exported in May 2023. Descriptive statistics and visualizations of the data set were produced with the pandas, Matplotlib and Seaborn packages [17][18][19] in Python (version 3.11). The processed data set was segmented into subsets of routine SMA visits, SMA therapy administrations and spine examinations. We then performed feature engineering guided by expert domain knowledge. In summary, the variables related to a feature were aggregated and converted to numeric values based on current domain knowledge of that feature. The engineered features used for training in this study include 'orthosis', 'ventilation', and 'contractures' (see also Appendix).
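Figure 1a describes visits being cleaned and merged on time with the spine examinations. The sketch below shows one way to perform that step. It uses pandas' `merge_asof` to attach to each routine visit the nearest spine examination of the same patient, and visits without a match form the unlabeled subset. The column names (`patient_id`, `visit_date`, `exam_date`, `scoliosis`) and the 90-day matching window are hypothetical; the text does not specify them.

```python
import pandas as pd

# Hypothetical routine-visit subset: one row per visit (column names are assumptions).
visits = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(["2020-01-10", "2020-07-15", "2021-02-01",
                                  "2020-03-05", "2020-09-20"]),
    "hfmse": [30, 28, 25, 45, 44],
})

# Hypothetical spine-examination subset carrying a binary scoliosis label.
spine = pd.DataFrame({
    "patient_id": [1, 2],
    "exam_date": pd.to_datetime(["2020-07-01", "2020-09-25"]),
    "scoliosis": [1, 0],
})

# merge_asof requires both frames to be sorted on their time keys.
visits = visits.sort_values("visit_date")
spine = spine.sort_values("exam_date")

# Attach the nearest spine exam of the same patient within 90 days
# (an illustrative window, not the authors' setting).
labeled = pd.merge_asof(
    visits, spine,
    left_on="visit_date", right_on="exam_date",
    by="patient_id", direction="nearest",
    tolerance=pd.Timedelta(days=90),
)

# Visits without a matching exam keep NaN labels and form the unlabeled subset.
train_test = labeled.dropna(subset=["scoliosis"])
unlabeled = labeled[labeled["scoliosis"].isna()]
print(train_test[["patient_id", "visit_date", "scoliosis"]])
```

With this toy input, only the two visits that lie within 90 days of an examination receive a label.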

Feature engineering
To overcome the challenge of training a machine learning model on a small data set, we engineered features to improve their individual predictive power. To do this, we drew on expert domain knowledge from clinicians specializing in the treatment of SMA patients and on the current literature. Contractures are common sequelae in SMA patients. Physicians assess contractures during routine visits and document their anatomical location (feet, knees, hips, elbows, wrists) and severity (none, mild, moderate, severe) in the patient's EHR. According to the current literature, contractures in the lower extremities occur early in SMA disease progression and have a significant impact on the patient's functional capabilities. To calculate the contractures score, we ranked contracture locations by their significance for SMA disease progression and exponentiated the rank with the severity of the contracture. To engineer the orthosis score, we similarly ranked orthosis locations by their correlation with SMA severity and converted that representation to a numeric value. This allowed us to encode numerically whether a patient had one orthosis type or a combination of several, which indirectly correlates with SMA severity. For the ventilation score, the ventilation type (none, non-invasive, invasive), frequency of use (daily, occasional, or when ill only), and duration of use (intermittent, night, day, or continuous) were combined into a score representing the intensity of the patient's ventilation requirements.
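The three engineered scores can be sketched as follows. All rank tables, ordinal levels and orthosis type names below are hypothetical placeholders, not the authors' values. The sketch also makes three interpretive assumptions:

- "exponentiated with the severity" is read as raising the location rank to the power of an ordinal severity level;
- combinations of orthoses are encoded as a bit mask, so that each combination maps to a unique integer;
- the ventilation score is the product of the three ordinal levels.

```python
from typing import Optional

# Hypothetical location ranks (higher = more relevant for SMA progression).
# Ranks start at 2 so that the severity exponent always changes the value.
CONTRACTURE_RANK = {"wrists": 2, "elbows": 3, "hips": 4, "knees": 5, "feet": 6}
SEVERITY_LEVEL = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

def contractures_score(findings: dict) -> int:
    """findings maps location -> severity; each present contracture adds rank ** severity."""
    return sum(CONTRACTURE_RANK[loc] ** SEVERITY_LEVEL[sev]
               for loc, sev in findings.items() if SEVERITY_LEVEL[sev] > 0)

# Hypothetical orthosis types, ordered by assumed association with SMA severity.
ORTHOSIS_RANK = {"ankle_foot": 0, "knee_ankle_foot": 1, "hip_knee_ankle_foot": 2}

def orthoses_score(orthoses: list) -> int:
    """Bit-mask encoding: every combination of orthosis types yields a unique integer."""
    return sum(2 ** ORTHOSIS_RANK[o] for o in set(orthoses))

# Hypothetical ordinal levels for ventilation intensity.
VENT_TYPE = {"none": 0, "non_invasive": 1, "invasive": 2}
VENT_FREQUENCY = {"when_ill": 1, "occasional": 2, "daily": 3}
VENT_DURATION = {"intermittent": 1, "night": 2, "day": 3, "continuous": 4}

def ventilation_score(vent_type: str, frequency: Optional[str] = None,
                      duration: Optional[str] = None) -> int:
    """Product of the three ordinal levels; 0 when the patient is not ventilated."""
    if VENT_TYPE[vent_type] == 0:
        return 0
    return VENT_TYPE[vent_type] * VENT_FREQUENCY[frequency] * VENT_DURATION[duration]

print(contractures_score({"feet": "moderate", "wrists": "none"}))
print(orthoses_score(["ankle_foot", "knee_ankle_foot"]))
print(ventilation_score("non_invasive", "daily", "night"))
```

With these placeholders, a moderate foot contracture contributes 6 ** 2 = 36 to the contractures score. Wearing both ankle-foot and knee-ankle-foot orthoses encodes to 1 + 2 = 3.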

Feature importance testing and feature selection
For univariate feature importance analysis, we performed correlation and predictive power score (PPS) analyses 20. To minimize the risk of data leakage during training, we removed all variables related to the spine. For multivariate feature importance analysis, we used mean decrease in impurity (MDI), permutation feature importance and Shapley Additive exPlanations (SHAP). We selected six SMA progression markers from different clinical domains to train a Random Forest Classifier (RFC). These progression markers were 'age at assessment', 'CHOP-INTEND score', 'contractures score', 'HFMSE score', 'orthoses score' and 'ventilation score'.
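A minimal sketch of the leakage precaution and two of the multivariate importance methods is shown below, using scikit-learn on synthetic data in place of the SMA visits. The feature column names are hypothetical. SHAP is omitted to avoid an extra dependency. The plain stratified split ignores patients; with real visits, the split should keep each patient's visits together.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical column names for the six selected progression markers.
FEATURES = ["age_at_assessment", "chop_intend", "contractures_score",
            "hfmse", "orthoses_score", "ventilation_score"]

# Synthetic stand-in for the labeled visits (about 27% 'no scoliosis').
X_arr, y = make_classification(n_samples=333, n_features=6, n_informative=4,
                               n_redundant=1, weights=[0.27], random_state=0)
X = pd.DataFrame(X_arr, columns=FEATURES)

# A spine-derived column would leak the label, so every spine variable is dropped.
X["cobb_angle_spine"] = y * 15.0
X = X.drop(columns=[c for c in X.columns if "spine" in c or "cobb" in c])

# Plain stratified split for brevity (real visits should be split by patient).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)
rfc = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Mean decrease in impurity (MDI): impurity reduction accumulated during training.
mdi = pd.Series(rfc.feature_importances_, index=FEATURES)

# Permutation importance: drop in held-out ROC AUC when one feature is shuffled.
perm = permutation_importance(rfc, X_te, y_te, scoring="roc_auc",
                              n_repeats=10, random_state=0)
perm_imp = pd.Series(perm.importances_mean, index=FEATURES)

print(pd.DataFrame({"MDI": mdi, "permutation": perm_imp})
      .sort_values("MDI", ascending=False))
```

MDI is computed from the training data and can favor high-cardinality features. Permutation importance is measured on held-out data, which is one reason to report both.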

Results
The raw data set contained 1230 visits from 86 genetically confirmed 5q-SMA patients. The processed data frame consisted of 695 visits from 86 SMA patients. Scoliosis labels were available for 59 patients, yielding a labeled train/test data set of 333 visits. Patients without scoliosis labels (n = 27) accounted for 150 visits (see also Fig. 1).

Individual features alone are poor predictors of scoliosis
For the univariate feature analysis, we plotted the correlation coefficients and predictive power scores as heatmaps (Fig. 2a,b). When a patient transitioned from assessment with the CHOP-INTEND score to the HFMSE score, the two scores were strongly correlated (r = 0.88) (Fig. 2a). We observed weak correlations of the CHOP-INTEND and HFMSE scores with the contractures score (r = 0.27 and 0.44, respectively). The ventilation score was moderately correlated with the HFMSE and CHOP-INTEND scores (r = 0.48 and 0.51, respectively). The predictive power scores showed that the CHOP-INTEND and HFMSE scores were moderately predictive of each other (PPS = 0.60 and 0.64). The HFMSE score, routinely used for patients aged 2 years and older 22, was predictive of age at assessment. Results of our multivariate feature analysis are shown in Fig. 2c,d. The SHAP analysis showed that high HFMSE scores lowered the model's output for class 1 'scoliosis', whereas increasing age at assessment raised it. Likewise, low contractures scores lowered the output for class 1. The mean decrease in impurity analysis over multiple folds showed relatively high feature importance for the HFMSE and CHOP-INTEND scores (Fig. 2). To test the predictive power of collective versus individual features, we compared ROC AUC curves (Supplemental Fig. 2). Training on collective features improved model performance.

Prediction of scoliosis based on collective clinical features
The model achieved a mean accuracy of 0.77 (SD 0.2) and an average ROC AUC of 0.85 (SD 0.17). For class 1 'scoliosis', the average precision was 0.84 (SD 0.11), the recall 0.89 (SD 0.22), and the F1-score 0.85 (SD 0.17). The average Matthews correlation coefficient (MCC) was 0.40 (Supplemental Fig. 3b). The confusion matrices of the ten stratified group K-folds are plotted in Supplemental Fig. 1a, and detailed performance metrics for each fold are given in Supplemental Fig. 3b. The results indicated variability across folds. To obtain a more robust assessment of the model's performance, we repeated the ten-fold stratified group K-fold cross-validation with ten different random states and calculated the average ROC AUC. The average ROC AUC across these ten random states was 0.84 (SD 0.007) (Fig. 3).

Predicted scoliosis probability
Using the trained RFC model, we plotted the predicted scoliosis probability for each patient against the patient's age in months, which shows how the predicted probability of scoliosis changes over time (Fig. 4). Note that the Cobb angle was not used for training. When we compared the model's predicted probability of scoliosis with Cobb angles measured on spine radiographs, we could retrace the inflection point that marked the model's predicted time point of scoliosis onset (Fig. 4).
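Locating the predicted onset in a patient's trajectory can be approximated as the first age at which the predicted probability reaches a threshold. This is a simplification of reading the inflection point from the plot, and the 0.5 threshold and function name are hypothetical. In practice, the probabilities would come from something like `rfc.predict_proba(patient_visits[FEATURES])[:, 1]`, where `patient_visits` is a hypothetical per-patient frame.

```python
import pandas as pd

def first_crossing_age(ages_months, probabilities, threshold=0.5):
    """Earliest age (months) at which the predicted scoliosis probability
    reaches `threshold`, or None if it never does."""
    trajectory = pd.DataFrame({"age": ages_months, "p": probabilities})
    trajectory = trajectory.sort_values("age")  # visits may arrive unordered
    above = trajectory[trajectory["p"] >= threshold]
    return None if above.empty else float(above["age"].iloc[0])

# Toy trajectory: the probability first reaches 0.5 at the 36-month visit.
print(first_crossing_age([12, 24, 36, 48], [0.20, 0.35, 0.62, 0.80]))  # 36.0
```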

Model validation on 'unknown scoliosis' data set
We further tested our trained model on the unlabeled subset of the SMA data set and performed follow-up examinations of the patients to verify the model's predictions. Some patients had relocated or were deceased, so their predicted labels could not be verified. We obtained follow-up examinations for ten patients, and the model had predicted the scoliosis label correctly for nine of them. Figure 5 shows examples of the model's predicted scoliosis probability together with the results of our follow-up examinations.

Discussion
SMA is one of the first monogenic neuromuscular disorders to be treated with gene therapy. The efficacy of new SMA therapies is changing the landscape of patient phenotypes. Early treatment improves outcomes 23, which is why SMA is now included in a growing number of newborn screening programs worldwide 2. In addition, more effective modes of administering SMA therapies are currently being explored (e.g., in utero therapy) 24. These advances will create a dynamic SMA patient landscape in which anticipating SMA disease progression and associated sequelae will guide decisions to provide the best standard of care. We developed an interpretable data-driven ML model that predicts scoliosis from SMA progression markers and can function as an assistive tool during interdisciplinary treatment evaluations. It is important to note that the current model has been trained to predict the likelihood that an SMA patient has scoliosis at a given visit based on routinely assessed clinical features. It has not been trained to forecast the presence of scoliosis at a future visit.

Random forest classifier as an assistive tool
The aim of this study was to develop an assistive tool that helps physicians involved in the treatment of SMA patients anticipate scoliosis onset. To be adopted in the future, such a tool must be robust and interpretable. Random Forest Classifiers have repeatedly demonstrated high accuracy and robustness in classification tasks. In addition, RFCs can capture nonlinear relationships between features and their target (here, 'scoliosis'). Physicians can interpret and interact with the model's output, and make decisions based on the presented predictions, through three aids: different feature importance methods, visualized decision trees, and plots of the model's predicted probabilities over time. How this technology will be used in the future will depend greatly on its integration with electronic health data collection and its adoption by physicians. Clinical features of SMA patients are currently assessed routinely in the SMArtCare study. Such an algorithm could perform automated scoliosis predictions from routine feature inputs and suggest referrals to pediatric spine specialists. The specialists could then combine the model's prediction with the child's feature development to derive individualized treatment plans and schedule reassessments and consultations.
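One way to make an individual tree in the forest inspectable is scikit-learn's `export_text`, sketched below on synthetic data. The feature names are hypothetical, and the shallow `max_depth=3` is an illustrative choice to keep the printed rules short, not the authors' hyperparameter. `sklearn.tree.plot_tree` draws the same tree graphically with Matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Hypothetical column names for the six selected progression markers.
FEATURES = ["age_at_assessment", "chop_intend", "contractures_score",
            "hfmse", "orthoses_score", "ventilation_score"]

X, y = make_classification(n_samples=333, n_features=6, n_informative=4,
                           n_redundant=1, weights=[0.27], random_state=0)

# Shallow trees keep the printed decision rules readable.
rfc = RandomForestClassifier(n_estimators=100, max_depth=3,
                             random_state=0).fit(X, y)

# Text rendering of the first tree in the forest: one line per split or leaf.
rules = export_text(rfc.estimators_[0], feature_names=FEATURES)
print(rules)
```

A single tree shows one of many voting paths, so it illustrates the forest's logic rather than explaining any individual prediction on its own.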

SMA data set statistics and representation
The training data set contained relatively balanced proportions of SMA types 1, 2, and 3 (31.23%, 44.14%, and 24.62%, respectively). The gender distribution of 48.19% females and 51.81% males aligns with previously reported SMA statistics. In the natural history of SMA, scoliosis develops in up to 60-95% of patients, depending on SMA type 25,26. Scoliosis labels made up 72.7% of the training data set, which was representative of the average prevalence of scoliosis in SMA 25. Overall, the classes 'scoliosis' and 'no scoliosis' were moderately imbalanced, so we chose to train a Random Forest Classifier, which is known for its robustness even under severe class imbalance. Notably, most patients in this data set were born in Germany (72.09%), one of the first countries to approve SMA therapies and to include SMA in newborn screening 1.
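To put "moderately imbalanced" in numbers, the reported 72.7% share of scoliosis labels corresponds to roughly 2.7 scoliosis labels per 'no scoliosis' label. The sketch below also computes the inverse-frequency weights that scikit-learn's `class_weight="balanced"` option would assign, n_samples / (n_classes * n_c), written here in terms of class proportions. The text does not state that such weighting was used, so this only illustrates one common option.

```python
def balanced_class_weights(p_positive: float) -> dict:
    """Inverse-frequency weights n_samples / (n_classes * n_c) for a binary task,
    expressed through the proportion of the positive class."""
    p_negative = 1.0 - p_positive
    return {0: 1.0 / (2 * p_negative), 1: 1.0 / (2 * p_positive)}

p_scoliosis = 0.727                          # share of scoliosis labels reported in the text
ratio = p_scoliosis / (1 - p_scoliosis)      # scoliosis labels per 'no scoliosis' label
weights = balanced_class_weights(p_scoliosis)
print(round(ratio, 2), {k: round(v, 2) for k, v in weights.items()})
# 2.66 {0: 1.83, 1: 0.69}
```

Such a dictionary can be passed as `RandomForestClassifier(class_weight=...)` if the minority class needs more influence during training.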

Scoliosis labels and scoliosis prediction
Scoliosis is diagnosed by a Cobb angle of > 10° (± 3°) on an anteroposterior spine radiograph, a measurement that may be debated even among experts. Interestingly, some of the model's predicted scoliosis probabilities reflected the uncertainty associated with this diagnostic interval. For patient 53, the model predicted 'scoliosis' with a probability of only 41% (Fig. 4). In the corresponding spine radiograph, the measured Cobb angles were 7° and 13°, representing scoliosis onset. Because the current model was trained as a binary scoliosis classifier, it cannot predict actual Cobb angles. Such a prediction would require more complex models and larger data sets that include spine radiographs taken in children with SMA at multiple time points. For ethical reasons, such a data set does not exist at the moment. Data augmentation could help address these obstacles in the future. Nevertheless, it is remarkable that the model's predicted scoliosis probability could reflect the diagnostic uncertainty in the physician's reading of the spine radiograph. Future performance testing of the model should include an external validation data set. In addition, the model will likely improve when retrained with additional data.
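The labeling rule can be written as a small function. As an illustration only, the sketch below applies the > 10° threshold and flags angles within the ± 3° measurement interval as borderline. The `borderline` flag is a hypothetical addition for this example and was not a model input. Both Cobb angles reported for patient 53 (7° and 13°) fall inside that interval.

```python
def scoliosis_label(cobb_angle_deg: float, threshold: float = 10.0,
                    measurement_error: float = 3.0) -> tuple:
    """Binary label (Cobb angle > threshold) plus a flag for angles within
    +/- measurement_error of the threshold (hypothetical illustration)."""
    label = int(cobb_angle_deg > threshold)
    borderline = abs(cobb_angle_deg - threshold) <= measurement_error
    return label, borderline

for angle in (7, 13, 25):
    print(angle, scoliosis_label(angle))
# 7 (0, True)
# 13 (1, True)
# 25 (1, False)
```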

Feature engineering and feature selection
This model was designed to provide robust predictions regardless of baseline biomarkers, which is why we deliberately selected routinely assessed SMA progression markers for training. Excluding baseline SMA disease markers such as SMA type, SMN2 copy number, motor milestones, first symptom onset, and SMA therapy reduced our model's performance metrics. In our opinion, however, it better aligned the model with its purpose and future application. Importantly, our focus on disease progression markers during model training is in line with the ongoing update of the SMA classification 27. To improve training on a rare disease data set, some features were engineered and encoded numerically using domain knowledge based on the current evidence reported in the literature. A better understanding of how different features affect SMA disease progression and scoliosis development could further improve the feature engineering process and, potentially, model performance. Nevertheless, RFCs have demonstrated robustness across different data types, including categorical and numerical data.

Limitations and bias in rare disease data sets
SMA is a rare disease, and data-driven models depend strongly on the data they are trained on. Despite rigorous data preparation and feature selection to ensure meaningful data representation, potential biases in the training set may limit the model's current deployment. External validation is therefore the necessary next step. To the best of our knowledge, the data set used in this study is one of the largest and most comprehensive SMA data sets to date. The robustness of the model should improve when it is retrained on a larger data set in the future. Potential solutions to the current limitations may include generative methods for data augmentation 28. However, these methods require extensive evaluation before they can be applied in clinical support tools. We were cautious in using such methods because our current understanding of the new SMA patient landscape is still limited. Our predictive model, which uses SMA disease progression markers, could augment clinical decision-making in the future by offering quantified probabilities of scoliosis development. These probabilities would provide actionable insights that extend beyond the capabilities of traditional assessments. Such insights would support targeted early intervention strategies and personalized care in neuromuscular disease management in the context of gene therapy.

Conclusion
SMA will likely not be the last NMD whose natural course is altered by drug treatment. New approaches to anticipating disease-associated sequelae and progression will be needed to continue providing the best standard of care. We demonstrate that rare disease data sets can be wrangled to build predictive ML models. These models harness the accumulated observations that normally train a clinician's intuition. Our trained model predicted scoliosis using selected disease progression markers, namely 'age at assessment', 'HFMSE score', 'CHOP-INTEND score', 'ventilation score', 'orthoses score' and 'contractures score'. Its scoliosis predictions correlated with radiological Cobb angle measurements. More importantly, the model could help clinicians anticipate scoliosis in patients who were unseen during model training. Such ML models can function as assistive tools during interdisciplinary patient evaluations and augment expertise in an era of disease-modifying therapies. In addition, they have the potential to democratize expertise that is otherwise clustered at specialized centers. Future work entails providing access to our model and validating it on a data set from an international clinic.

Figure 1. (a) Data processing flow in the predictive modeling process. 1230 visits of 86 genetically confirmed SMA patients were cleaned and merged on time. The 333 visits were labeled with the corresponding scoliosis labels from available spine examinations for supervised training of a RandomForestClassifier (RFC). (b) Schematic visualization of data subsets for training and validation. The model's predictions were tested on visits without scoliosis labels and on patient subsets in which scoliosis status was unknown. (c) Table 1 summarizes the demographics and features in the training and testing data set used during model development. Table 2 summarizes the demographics of the validation data set used to validate the model after training.

Figure 3. Training and performance evaluation of a binary Random Forest Classifier for scoliosis prediction. (a) Cross-validation using StratifiedGroupKFold with ten folds. (b) Average receiver operating characteristic (ROC) curves for ten cross-validation folds across ten different random states. (c) Exemplary decision tree.

Figure 5. Model validation on the 'unknown scoliosis' data set. The predicted scoliosis probabilities for patients 3, 39, 33, 43, 73, and 84 are plotted on the left; the corresponding follow-up examinations of the spine are depicted on the right.
The mean age of patients in the validation data set was 40.92 months (SD 53.95), and SMN2 copy numbers ranged from 2 to 4: 51 (34.5%) had 2 SMN2 copies, 73 (49.3%) had 3 copies and 24 (16.2%) had 4 copies (Table 2 in Fig. 1c). The mean HFMSE score was 46.33 (SD 13.91), and the mean CHOP-INTEND score was 50.13 (SD 13.31). The mean age at first SMA disease onset was 34.48 months (SD 56.07) and the mean age at first SMA therapy was 36.37 months (SD 51.45) (Table